AI029

Reinforcement Learning: An Introduction

Temporal-Difference Learning

Lecture

Lesson 6

Date

2026-04-21

Teacher

AI Tutor

Duration

60 Mins

Learning Objectives

Define the TD(0) update rule and its relation to the Bellman equation.
Contrast TD learning with Monte Carlo methods in terms of bias, variance, and the ability to update online.
Explain the concept of bootstrapping and its role in TD prediction.
Describe the Sarsa (on-policy) and Q-learning (off-policy) algorithms for control.
Analyze the advantages of TD learning when no model of the environment's transition dynamics is available.
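
As a preview of the first objective, the following is a minimal sketch of a single tabular TD(0) update, V(s) <- V(s) + alpha * [r + gamma * V(s') - V(s)]. The function name td0_update, the dictionary-based value table, and the example states "A" and "B" are illustrative assumptions, not part of the lesson materials.

```python
def td0_update(V, state, reward, next_state, alpha=0.1, gamma=1.0, terminal=False):
    """Apply one TD(0) update to the value table V in place; return the TD error.

    V is a plain dict mapping states to value estimates (an assumption made
    for this sketch; any mutable mapping would work).
    """
    # Bootstrapping: the target uses the current estimate of the next state's
    # value instead of waiting for the full return. Terminal states have value 0.
    next_value = 0.0 if terminal else V[next_state]
    td_error = reward + gamma * next_value - V[state]
    # Move the estimate a fraction alpha of the way toward the target.
    V[state] += alpha * td_error
    return td_error


# Hypothetical two-state example: one observed transition A -> B with reward 1.
V = {"A": 0.0, "B": 0.5}
delta = td0_update(V, "A", reward=1.0, next_state="B", alpha=0.1, gamma=0.9)
```

Because the update needs only the observed transition (state, reward, next state), it can be applied online after every step and requires no transition model. In this example the TD error is 1.0 + 0.9 * 0.5 - 0.0 = 1.45, so V["A"] moves from 0.0 to roughly 0.145.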